ABSTRACT


The popularity of social media has increased tremendously in recent times, and cyberbullying has increased at an alarming rate along with it. Many cyberbullying texts can be found in the comment sections of YouTube videos by well-known Bangladeshi social media personalities. Such texts can cause severe emotional and psychological distress. Therefore, texts containing cyberbullying should be detected at the earliest stage and prevented from being displayed. In this study, we use natural language processing (NLP) techniques and various machine learning classifiers and present a model for cyberbullying detection in Bangla and Romanized Bangla texts obtained from YouTube video comments. We developed our own datasets using the YouTube application programming interface (API) version 3.0. We collected 5000 Bangla comments and 7000 Romanized Bangla comments from videos of different well-known social media personalities. These two datasets, together with a third dataset of 12000 texts formed by combining the first two, were preprocessed using NLP techniques and then used to train machine learning classifiers. With an accuracy of 76%, support vector machine (SVM) outperformed the other classifiers on the first dataset. The highest accuracy scores for the second and third datasets were 84% and 80%, respectively, both achieved by multinomial naive Bayes.

Keywords: Cyberbullying; Machine learning; Natural language processing; YouTube comments
1. INTRODUCTION

One of the most common digital pastimes today is spending time on social media sites such as Facebook, Instagram, Twitter, and YouTube [1]. Almost 3.6 billion people were using social media in 2020, an escalation rate of 49% as of January 2020, and people spend an average of 144 minutes per day on social media. This huge increase in social media use brings great advantages as well as disadvantages. One of the most serious drawbacks is the rise of cyberbullying on various social media sites.

Cyberbullying is described as the intentional and repeated infliction of harm through electronic media [2]. Over 80% of children own a mobile phone and use social network sites; of these, 57% admitted to having experienced cyberbullying, and 60% of children and young people have witnessed bullying on social media. This horrible experience undermines a person's freedom to use online resources and also causes several psychological effects [3]. Cyberbullying victims are 1.9 times more likely to commit suicide and furthermore endure cerebral problems such as autism (75%), somatic faults (70%), and learning complications (52%). This increase in cyberbullying has also made its prevention more urgent, and detection can be a significant step toward prevention: if texts that contain cyberbullying are detected at the earliest stage, they can be prevented from being posted as comments.

Machine learning based classification models can be of great help in detecting cyberbullying. Over the years, machine learning models have proved their efficiency in prediction and detection, and a large body of research applies machine learning to such tasks.

Bharat et al. [4] used machine learning algorithms to predict and diagnose breast cancer, and Kaur and Kumari [5] classified diabetic and non-diabetic patients using a machine learning approach. Farhana et al. [6] utilized a deep learning approach to detect intrusions in packet-based and flow-based networks, and in [7], they again presented machine learning models for automated traffic classification and application identification. Also, Hossain et al. [8] used machine learning algorithms to predict the ratings of product reviews, and in [9], the authors proposed a method of tracking and detecting vehicles in real-time video streams using a blob tracker algorithm.

There is also much research on text-based machine learning classification. For example, Ikonomakis et al. [10] used machine learning techniques to conduct text classification, and Boiy and Moens [11] used machine learning to evaluate sentiment in English, Dutch, and French texts. Such text-based classification models can be of great use in distinguishing cyberbullying texts from regular texts. Similar work was presented by Haidar et al. [12], who detected cyberbullying in Arabic and English texts using machine learning models, and in [13], cyberbullying in Spanish-language tweets was detected using a machine learning approach. Similarly, Greevy and Smeaton [14] developed a system to detect racism using machine learning techniques. There is also research on bullying detection in Bangla texts, where Al-Mamun and Akhter [15] proposed a machine learning based approach.

The remainder of this paper is organized as follows: section 2 reviews works relevant to our study, section 3 presents the methodology, section 4 discusses the findings, and section 5 presents the conclusion and future work.


2. RELATED WORKS 

As many researchers are working to detect cyberbullying in several languages, some previous studies are available in this field. In this section, we discuss the works most relevant to our study. Haidar et al. [12] proposed a solution for detecting cyberbullying using machine learning. In their research, they used both English and Romanized texts, and support vector machine (SVM) had the highest overall precision (93.4%). Using SVM and naive Bayes classifiers, Dalvi et al. [16] proposed a machine learning model to identify and eliminate cyberbullying. The data was obtained from Twitter through the Twitter application programming interface (API). In their analysis, SVM achieved a higher accuracy (71.25%) than naive Bayes (52.70%).

Several other studies have addressed cyberbullying detection. Paredes et al. [13] retrieved Spanish texts from Twitter and achieved a 93% accuracy rate using machine learning algorithms. Banerjee et al. [17] introduced a novel deep neural network approach for cyberbullying detection, in which the CNN method achieved a maximum testing accuracy of 93.97%. Ali and Syed [18] also detected cyberbullying using machine learning techniques; they used three datasets, and SVM had the highest average accuracy of 80%. These excellent efforts inspired us to pursue cyberbullying detection in Bangla and Romanized Bangla texts.

Machine learning has also been applied to Bangla cyberbullying detection. Al-Mamun and Akhter [15] suggested using machine learning to detect cyberbullying in Bangla text. They collected 2400 statuses from Facebook and Twitter and applied machine learning algorithms in two phases; their highest accuracy, 97.27%, was achieved by SVM.

Chakraborty and Seddiqui [19] used machine learning and deep learning to classify Bangla texts, with SVM performing best at 78% accuracy. Similarly, Ahammed et al. [20], who gathered their Bengali data from Facebook, achieved a maximum accuracy of 72%. These works on cyberbullying detection in Bangla motivated us to work with Bangla data collected from YouTube. In contrast, very few works have used Romanized Bangla texts. Tripto and Ali [21] classified the sentiment of Bangla, Romanized Bangla, and English texts collected from YouTube using long short-term memory (LSTM), convolutional neural network (CNN), naive Bayes, and SVM, and reported an accuracy of 65% for LSTM. Similarly, Hassan et al. [22] trained a deep recurrent model on Bangla and Romanized Bangla texts, which achieved a highest accuracy of 78%. Because the number of works on Romanized Bangla texts is minimal, we decided to conduct our research using Romanized Bangla texts collected from YouTube.
3. METHODOLOGY
3.1. Workflow 

Using natural language processing techniques and machine learning classifiers, we aim to identify cyberbullying texts obtained from YouTube video comment sections. A total of three datasets were used throughout this research. The datasets were preprocessed using natural language processing (NLP) techniques and then used to train the machine learning classifiers. Finally, performance was analyzed in terms of accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (AUC-ROC). Figure 1 depicts the proposed methodology.
Figure 1. Proposed methodology (data: Dataset 1, 5000 Bangla texts; Dataset 2, 7000 Romanized Bangla texts; Dataset 3, Dataset 1 + Dataset 2; preprocessing; classifiers: multinomial naive Bayes, support vector machine, logistic regression, XGBoost)


3.2. Dataset 

The most important phase of our research is data collection. For this purpose, we collected comments from YouTube using the YouTube API. The videos, which featured several well-known social media personalities from Bangladesh, were hand-picked. The collected comments included both Bangla and Romanized Bangla texts, which were divided into two datasets: Dataset 1 contained 5000 Bangla texts and Dataset 2 contained 7000 Romanized Bangla texts. The first two datasets were then combined to create a third dataset with a total of 12000 texts. Following that, we annotated all the datasets into two categories: bullying and non-bullying. Some of the annotated data is presented in Table 1.
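The collection step above can be sketched against the `commentThreads.list` method of the YouTube Data API v3. This is a minimal sketch under stated assumptions, not the script used in this study: the function names are hypothetical, only top-level comments are gathered (replies are skipped), the `textDisplay` field is read with `textFormat=plainText`, and the API key and video IDs are placeholders the reader must supply.

```python
import json
import urllib.parse
import urllib.request

# Endpoint of the YouTube Data API v3 commentThreads.list method.
API_URL = "https://www.googleapis.com/youtube/v3/commentThreads"


def build_request_url(video_id, api_key, page_token=None):
    """Build one commentThreads.list request (at most 100 threads per page)."""
    params = {
        "part": "snippet",
        "videoId": video_id,
        "maxResults": 100,
        "textFormat": "plainText",
        "key": api_key,
    }
    if page_token:
        params["pageToken"] = page_token
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_comments(response):
    """Return the top-level comment texts and the next page token of one response."""
    texts = [
        item["snippet"]["topLevelComment"]["snippet"]["textDisplay"]
        for item in response.get("items", [])
    ]
    return texts, response.get("nextPageToken")


def fetch_all_comments(video_id, api_key):
    """Follow nextPageToken until every page is read (requires network access)."""
    comments, token = [], None
    while True:
        with urllib.request.urlopen(build_request_url(video_id, api_key, token)) as resp:
            data = json.load(resp)
        texts, token = extract_comments(data)
        comments.extend(texts)
        if not token:
            return comments
```

Separating URL building and response parsing from the network call keeps the paging logic testable offline.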


Table 1. Sample of annotated data (the Bangla-script originals were not recoverable from the extracted text; their English translations are kept)

Text | Language | Label
[Bangla text] (The foul woman is not ashamed.) | Bangla | 1 (Bullying)
[Bangla text] (I don't even know who you are. You are a horrible singer.) | Bangla | 1 (Bullying)
[Bangla text] (Truly speaking, Omor Sani brother is an awesome man.) | Bangla | 0 (Not Bullying)
[Bangla text] (It's good, everyone is my favourite, having good quality.) | Bangla | 0 (Not Bullying)
3rd class qualityr 2 person (Both of them are third class quality persons.) | Romanized | 1 (Bullying)
Sob gula hijra magir chaoaal. (All of them are bastards.) | Romanized | 1 (Bullying)
Vlo laglo (Feels good.) | Romanized | 0 (Not Bullying)
Bachaa digitake beshi valo lage (I like childhood dighi even more.) | Romanized | 0 (Not Bullying)
3.3. Preprocessing

We started preprocessing by removing duplicate entries from our datasets. All three datasets were then stripped of digits, emoticons, punctuation marks, links and uniform resource locators (URLs), user tags and mentions, and elongated words. Some texts contained both Bangla and Romanized Bangla; these were removed from the datasets to obtain reliable results. We did not perform stop word removal or stemming, since the texts were mainly in a local language.
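The cleaning and script-separation steps above can be sketched with the Python standard library. This is an illustrative sketch, not the authors' pipeline: the function names are hypothetical, digits, punctuation, and symbols (which covers most emoji) are removed via Unicode categories, Bangla script is detected with the Unicode Bengali block (U+0980 to U+09FF), and elongated-word handling is not included because the exact rule is not specified in the text.

```python
import re
import unicodedata

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
BANGLA_RE = re.compile(r"[\u0980-\u09FF]")  # Unicode Bengali block
LATIN_RE = re.compile(r"[A-Za-z]")


def clean(text):
    """Strip links, @mentions, digits, punctuation and symbols (incl. most emoji)."""
    text = MENTION_RE.sub(" ", URL_RE.sub(" ", text))
    chars = []
    for ch in text:
        category = unicodedata.category(ch)
        # Nd: decimal digits (ASCII and Bangla); P*: punctuation; S*: symbols, most emoji
        if category == "Nd" or category[0] in "PS":
            chars.append(" ")
        else:
            chars.append(ch)
    return " ".join("".join(chars).split())  # collapse runs of whitespace


def script_of(text):
    """Classify a cleaned text as 'bangla', 'romanized', 'mixed' or 'empty'."""
    has_bangla = bool(BANGLA_RE.search(text))
    has_latin = bool(LATIN_RE.search(text))
    if has_bangla and has_latin:
        return "mixed"
    if has_bangla:
        return "bangla"
    return "romanized" if has_latin else "empty"


def preprocess(comments):
    """Deduplicate, clean, and split comments by script; mixed-script texts are dropped."""
    bangla, romanized = [], []
    for text in dict.fromkeys(comments):  # removes exact duplicates, keeps first-seen order
        cleaned = clean(text)
        kind = script_of(cleaned)
        if kind == "bangla":
            bangla.append(cleaned)
        elif kind == "romanized":
            romanized.append(cleaned)
    return bangla, romanized
```

Bangla vowel signs are combining marks (categories Mc and Mn), so they survive the category filter while the danda and Bangla digits are removed.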


3.4. Feature extraction 

We used term frequency-inverse document frequency (TF-IDF) to extract features from the datasets. TF-IDF is a powerful feature extraction technique that identifies important words in textual data [23]. It transforms strings into numerical values that machine learning classifiers can use. Term frequency (TF) is the number of times a word appears in a document divided by the total number of words in the document, as in (1).

$$\mathrm{TF} = \frac{\text{Frequency of a particular word in the document}}{\text{Total number of words in the document}} \tag{1}$$

Inverse document frequency (IDF) weights words by how informative they are across the whole collection: a word that occurs in fewer documents receives a higher weight. It is measured using (2).

$$\mathrm{IDF} = \log_e\left(\frac{\text{Total documents}}{\text{Documents with a particular term}}\right) \tag{2}$$

Finally, the term frequency and inverse document frequency are multiplied to obtain the TF-IDF, which will have normalized weights. It is calculated with (3).

$$\mathrm{TF\text{-}IDF} = \mathrm{TF} \times \mathrm{IDF} \tag{3}$$
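Equations (1) to (3) can be computed directly, as in the minimal sketch below. It assumes whitespace tokenization and the natural logarithm, and the function name is hypothetical. Library implementations differ: scikit-learn's `TfidfVectorizer`, for example, uses raw counts for TF, a smoothed IDF, and L2 row normalization by default, so its numbers will not match this sketch exactly.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document, following (1), (2) and (3)."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term at least once
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])  # TF (1) times IDF (2)
            for term, count in counts.items()
        })
    return vectors


weights = tf_idf(["vlo laglo", "vlo na"])
# "vlo" appears in every document, so its IDF (and TF-IDF) is 0;
# "laglo" gets TF = 1/2 and IDF = ln(2).
```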


3.5. Machine learning classifiers 

Machine learning classifiers are widely used to predict categorical data. Today, machine learning is used to build various intelligent systems that make decision making easier. Machine learning is a broad term that encompasses supervised, unsupervised, and reinforcement learning. In this research, we used four supervised machine learning classifiers.


3.5.1. Multinomial naive Bayes 

The multinomial naive Bayes algorithm is a probabilistic learning method popular in NLP. The algorithm predicts using Bayes' theorem [24]: it calculates the probability of each class c for a given sample x using (4) and outputs the class with the highest probability.

$$P(c \mid x) = \frac{P(x \mid c)\, P(c)}{P(x)} \tag{4}$$
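A minimal multinomial naive Bayes classifier that applies (4) is sketched below. It is a teaching sketch, not the implementation used in this study: the class name is hypothetical, features are whitespace-token counts rather than TF-IDF weights, add-one (Laplace) smoothing is assumed, and log probabilities are used to avoid underflow. The toy training texts are the Romanized examples from Table 1.

```python
import math
from collections import Counter


class MultinomialNaiveBayes:
    """Multinomial naive Bayes on token counts with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {w for doc in docs for w in doc.split()}
        self.log_prior, self.log_likelihood = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            # log P(c): fraction of training documents that belong to class c
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(w for d in class_docs for w in d.split())
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # log P(w | c), smoothed so that no word gets probability zero
            self.log_likelihood[c] = {
                w: math.log((counts[w] + self.alpha) / denom) for w in self.vocab
            }
        return self

    def predict(self, doc):
        # P(x) in (4) is identical for every class, so the argmax of
        # log P(c) + log P(x | c) picks the same class as the full posterior.
        scores = {
            c: self.log_prior[c]
            + sum(self.log_likelihood[c][w] for w in doc.split() if w in self.vocab)
            for c in self.classes
        }
        return max(scores, key=scores.get)


model = MultinomialNaiveBayes().fit(
    ["3rd class person", "hijra magir chaoaal", "vlo laglo", "beshi valo lage"],
    [1, 1, 0, 0],  # 1 = bullying, 0 = not bullying
)
```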


3.5.2. Support vector machine (SVM) 

Support vector machine (SVM) is widely used for classification problems. It classifies data by constructing a decision boundary, or hyperplane, in an n-dimensional space [25]. Among the many possible hyperplanes, the one with the largest margin is chosen. It has an edge over other classifiers because of its faster processing speed and better performance with fewer samples.
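The margin idea above can be illustrated with a linear SVM trained by Pegasos-style subgradient descent on the L2-regularized hinge loss. This is a swapped-in training technique chosen for brevity, not necessarily the solver used in this study. The sketch makes a cyclic pass over the data instead of sampling randomly, appends a constant feature as the bias (so the bias is also regularized), expects labels in {-1, +1}, and its function names and parameter values are hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, epochs=100):
    """Pegasos-style subgradient descent; X holds feature lists, y holds -1/+1 labels."""
    data = [list(x) + [1.0] for x in X]  # constant feature acts as the bias term
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        for x, label in zip(data, y):
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]  # shrink: gradient of the L2 term
            if margin < 1:  # hinge loss is active: push the hyperplane away from x
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def svm_predict(w, x):
    """Side of the hyperplane on which x falls."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


X = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]
y = [1, 1, -1, -1]
w = train_linear_svm(X, y)
```

In practice, a TF-IDF matrix would be passed to a library solver; the loop above only shows how the hinge loss and the regularizer together pull the hyperplane toward a large margin.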


3.5.3. Logistic regression 

Logistic regression is a commonly used classifier for binary classification. To classify data, logistic regression uses a sigmoid function, which maps any real value to a value between 0 and 1 [26]. The sigmoid function is shown in (5).

$$S(z) = \frac{1}{1 + e^{-z}} \tag{5}$$

The value returned by the function is converted into 0 or 1 by setting a threshold: values above the threshold are classified as class 1, and values below it are classified as class 0.
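The sigmoid in (5) and the threshold rule can be sketched as follows. As assumptions for illustration, the weights are fitted by plain batch gradient descent on the mean log loss, the threshold is 0.5, and the function names, learning rate, and epoch count are hypothetical. The data is a one-dimensional toy example.

```python
import math


def sigmoid(z):
    """Equation (5), written in a numerically stable form for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the mean log loss; X holds feature lists, y holds 0/1."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, label in zip(X, y):
            # Gradient of the log loss for one sample: (S(z) - label) * x
            error = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - label
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x, threshold=0.5):
    """Class 1 if S(z) reaches the threshold, otherwise class 0."""
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= threshold else 0


w, b = fit_logistic([[0.0], [1.0], [3.0], [4.0]], [0, 0, 1, 1])
```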


3.5.4. XGBoost 

XGBoost is an ensemble of decision trees [27]; it is a machine learning classifier that uses a gradient boosting algorithm. XGBoost is known for its fast execution speed and high model performance, which makes it extremely useful for achieving good results with minimal resources and time.
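The boosting idea can be sketched with depth-one trees (stumps) that use XGBoost's second-order leaf weight, w = -G / (H + λ), and its split gain, ½[G_L²/(H_L+λ) + G_R²/(H_R+λ) - (G_L+G_R)²/(H_L+H_R+λ)], on the logistic loss. This is a simplified stand-in, not the xgboost library: the tree depth is fixed at 1, the gain penalty γ is taken as 0, there is no column subsampling, and the function names and parameter values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def best_stump(X, g, h, lam):
    """Return (gain, feature, threshold, left_weight, right_weight) of the best split."""
    G, H = sum(g), sum(h)
    best = None
    for j in range(len(X[0])):
        order = sorted(range(len(X)), key=lambda i: X[i][j])
        GL = HL = 0.0
        for k in range(len(order) - 1):
            i = order[k]
            GL, HL = GL + g[i], HL + h[i]
            a, b = X[i][j], X[order[k + 1]][j]
            if a == b:
                continue  # no threshold separates equal feature values
            GR, HR = G - GL, H - HL
            gain = 0.5 * (GL**2 / (HL + lam) + GR**2 / (HR + lam) - G**2 / (H + lam))
            if best is None or gain > best[0]:
                best = (gain, j, (a + b) / 2, -GL / (HL + lam), -GR / (HR + lam))
    return best


def fit_boosted_stumps(X, y, rounds=20, eta=0.3, lam=1.0):
    """Each round fits one stump to the gradients and Hessians of the log loss."""
    raw = [0.0] * len(X)  # raw scores (log-odds) of the current ensemble
    stumps = []
    for _ in range(rounds):
        p = [sigmoid(r) for r in raw]
        g = [pi - yi for pi, yi in zip(p, y)]  # first derivative of the log loss
        h = [pi * (1 - pi) for pi in p]  # second derivative of the log loss
        split = best_stump(X, g, h, lam)
        if split is None:
            break
        _, j, thr, wl, wr = split
        stumps.append((j, thr, eta * wl, eta * wr))  # eta: shrinkage (learning rate)
        raw = [r + (eta * wl if x[j] < thr else eta * wr) for r, x in zip(raw, X)]
    return stumps


def predict_proba(stumps, x):
    return sigmoid(sum(wl if x[j] < thr else wr for j, thr, wl, wr in stumps))


stumps = fit_boosted_stumps([[0.0], [1.0], [3.0], [4.0]], [0, 0, 1, 1])
```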
3.6. Performance evaluation

Performance evaluation is critical for analyzing any trained machine learning classifier. We considered the confusion matrix, precision, recall, F1-score, accuracy, and the AUC-ROC curve [28], [29] for performance evaluation. We also report how many predictions each classifier made correctly and incorrectly.

The confusion matrix is a very important performance evaluation tool. It combines four distinct outcomes of actual and predicted values: true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). The confusion matrix plays a vital role in computing accuracy, precision, recall, F1-score, and the AUC-ROC curve. Accuracy is the ratio of correct predictions to the total number of input samples [28] and is calculated using (6).

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{6}$$
The number of correct positive predictions divided by the total number of positive predictions made by a classifier yields the precision value [28], which is calculated using (7).

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{7}$$
The number of correct positive predictions divided by the total number of actual positive samples yields the recall value [28]. It is calculated using (8).

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{8}$$
The F1-score is the harmonic mean of precision and recall [28]; a higher F1-score indicates better performance. It is calculated using (9).

$$\text{F1-score} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{9}$$


The AUC-ROC curve tells us how good a model is at distinguishing between classes [29]. A higher AUC 
indicates that the model is better at prediction. 
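The AUC described above can be computed from classifier scores without drawing the curve, because the area under the ROC curve equals the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative one (ties count as one half). The sketch below, with hypothetical function names and toy scores, computes that quantity and also the (FPR, TPR) points of the ROC curve obtained by sweeping the decision threshold; labels are assumed to be 1 for bullying and 0 otherwise.

```python
def roc_auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def roc_points(scores, labels):
    """(FPR, TPR) pairs obtained by lowering the threshold through every distinct score."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))
```

For the same scores, `trapezoid_area(roc_points(...))` and `roc_auc(...)` agree, which is why the rank formulation is a convenient way to report the AUC.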


4. RESULTS AND DISCUSSION 

We divided each dataset into 80% for training and 20% for testing. After training the classifiers on the 80% training sets, we used the 20% testing sets to evaluate performance. Figure 2 displays the number of correctly and incorrectly classified instances for each classifier across all the datasets. SVM correctly identified 757 instances (75.7%) in Dataset 1, more than any other classifier, while multinomial naive Bayes had the highest number of correctly classified instances in Datasets 2 and 3, with 1180 (84.29%) and 1928 (80.33%), respectively. Table 2 shows the confusion matrix of the best performing algorithm for each dataset.
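An 80/20 split of this kind can be sketched as below. The text does not state the random seed or whether the split was stratified, so this sketch assumes a plain shuffled split with a fixed, hypothetical seed; the function name is also hypothetical. With 5000, 7000, and 12000 texts it yields test sets of 1000, 1400, and 2400 samples, which matches the totals of the confusion matrices in Table 2.

```python
import random


def train_test_split(texts, labels, test_fraction=0.2, seed=42):
    """Shuffle indices with a fixed seed, then hold out test_fraction of them for testing."""
    indices = list(range(len(texts)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def take(idx, seq):
        return [seq[i] for i in idx]

    return take(train_idx, texts), take(test_idx, texts), take(train_idx, labels), take(test_idx, labels)
```

Keeping texts and labels indexed together guarantees that every test label still belongs to its own text after shuffling.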

From Table 2, SVM correctly classifies 363 cyberbullying texts and 394 non-cyberbullying texts of Dataset 1. For Dataset 2, multinomial naive Bayes correctly classifies 661 cyberbullying texts and 519 non-cyberbullying texts, and for Dataset 3 it correctly classifies 1086 cyberbullying texts and 842 non-cyberbullying texts. Table 3 details the precision, recall, and F1-score of all algorithms for all datasets, and Figure 3 shows the accuracy of all algorithms.

Table 3 and Figure 3 show that for Dataset 1, SVM achieved a precision, recall, and F1-score of 0.76 each, as well as an overall accuracy of 76%, the highest of all the algorithms. For Dataset 2, multinomial naive Bayes achieved a precision, recall, and F1-score of 0.84 each, as well as an overall accuracy of 84%, the highest of all the algorithms. Finally, multinomial naive Bayes again outperformed all other algorithms for Dataset 3, achieving a precision of 0.81, a recall and F1-score of 0.80 each, and an overall accuracy of 80%.
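The link between Table 2 and Table 3 can be checked with equations (6) to (9). For the bullying class alone, the Dataset 1 SVM counts give a precision of 363/504 ≈ 0.72, so the 0.76 reported in Table 3 appears to be a support-weighted average over both classes; that interpretation is our inference and is not stated in the text. The sketch below, with hypothetical function names, computes per-class and weighted scores from a confusion matrix.

```python
def class_prf(correct, false_pos, false_neg):
    """Precision, recall and F1 for one class, following (7), (8) and (9)."""
    precision = correct / (correct + false_pos)
    recall = correct / (correct + false_neg)
    return precision, recall, 2 * precision * recall / (precision + recall)


def summarize(tp, fp, fn, tn):
    """Accuracy (6) plus support-weighted precision, recall and F1 over both classes."""
    total = tp + fp + fn + tn
    per_class = {
        1: class_prf(tp, fp, fn),  # bullying treated as the positive class
        0: class_prf(tn, fn, fp),  # for the negative class, FP and FN swap roles
    }
    support = {1: tp + fn, 0: tn + fp}  # actual number of samples in each class
    weighted = tuple(
        sum(per_class[c][k] * support[c] for c in (0, 1)) / total for k in range(3)
    )
    return {"accuracy": (tp + tn) / total, "per_class": per_class, "weighted": weighted}


# Dataset 1, SVM counts from Table 2
result = summarize(tp=363, fp=141, fn=102, tn=394)
print(round(result["accuracy"], 3), [round(v, 2) for v in result["weighted"]])
# prints 0.757 [0.76, 0.76, 0.76]
```

The same computation on the Dataset 3 counts gives weighted values of 0.81, 0.80, and 0.80, which also agree with Table 3.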

Another performance measure is the area under the ROC curve: the larger the area, the better a model separates the two classes. Figure 4 shows the ROC curve of SVM for Dataset 1 and the ROC curves of multinomial naive Bayes for Datasets 2 and 3, as these two algorithms performed best among the four. As Figure 4 shows, multinomial naive Bayes is the highest performing algorithm, performing best for Datasets 2 and 3. It also performs reasonably well for Dataset 1 but is outperformed there by SVM.
Figure 2. Number of correctly and incorrectly classified instances:
(a) number of correctly classified instances and (b) number of incorrectly classified instances 


Table 2. Confusion matrix of the best performing algorithm for each dataset

Dataset | Algorithm | TP | FP | FN | TN
Dataset 1 | SVM | 363 | 141 | 102 | 394
Dataset 2 | Multinomial Naive Bayes | 661 | 78 | 142 | 519
Dataset 3 | Multinomial Naive Bayes | 1086 | 158 | 314 | 842


Table 3. Precision, recall and F1-scores by class of all algorithms for all datasets

Dataset | Algorithm | Precision | Recall | F1-score
Dataset 1 | Multinomial Naive Bayes | 0.74 | 0.74 | 0.74
Dataset 1 | SVM | 0.76 | 0.76 | 0.76
Dataset 1 | Logistic Regression | 0.76 | 0.75 | 0.75
Dataset 1 | XGBoost | 0.75 | 0.74 | 0.74
Dataset 2 | Multinomial Naive Bayes | 0.84 | 0.84 | 0.84
Dataset 2 | SVM | 0.83 | 0.83 | 0.83
Dataset 2 | Logistic Regression | 0.83 | 0.83 | 0.82
Dataset 2 | XGBoost | 0.80 | 0.78 | 0.78
Dataset 3 | Multinomial Naive Bayes | 0.81 | 0.80 | 0.80
Dataset 3 | SVM | 0.79 | 0.79 | 0.79
Dataset 3 | Logistic Regression | 0.79 | 0.79 | 0.79
Dataset 3 | XGBoost | 0.76 | 0.74 | 0.74


TELKOMNIKA Telecommun Comput El Control, Vol. 20, No. 1, February 2022: 89-97 




Figure 3. Accuracy of all the classifiers (y-axis: accuracy; x-axis: datasets; series: multinomial naive Bayes, SVM, logistic regression, XGBoost)
Figure 4. ROC curves for the best performing algorithms


5. CONCLUSION AND FUTURE WORK 

With the increased use of social media sites, people interact with each other more often, which has also increased the amount of cyberbullying. To detect cyberbullying texts, we developed models based on NLP techniques and machine learning. We collected a total of 12000 texts from YouTube and prepared three datasets, which were used to train the models. After testing, two algorithms stood out. SVM performed best on the first dataset with an accuracy of 76%, and multinomial naive Bayes produced the best results on the second and third datasets with accuracies of 84% and 80%, respectively. SVM also obtained a precision, recall, and F1-score of 76% each on the first dataset. On the second dataset, multinomial naive Bayes achieved 85% precision, 84% recall, and 84% F1-score, and on the third dataset it achieved 81% precision and 80% recall and F1-score. The other two algorithms, logistic regression and XGBoost, also performed reasonably well but were slightly outperformed by SVM and multinomial naive Bayes. These trained models can be used to detect Bangla and Romanized Bangla cyberbullying texts on YouTube at an early stage and stop them from being posted. In the future, we want to work with more data and with more videos from different categories.